Role of DNA methylation in the relationship between glioma risk factors and glioma incidence: a two-step Mendelian randomization study

Genetic evidence suggests glioma risk is altered by leukocyte telomere length, allergic disease (asthma, hay fever or eczema), alcohol consumption, childhood obesity, low-density lipoprotein cholesterol (LDLc) and triglyceride levels. DNA methylation (DNAm) variation influences many of these glioma-related traits and is an established feature of glioma. Yet the causal relationship between DNAm variation with both glioma incidence and glioma risk factors is unknown. We applied a two-step Mendelian randomization (MR) approach and several sensitivity analyses (including colocalization and Steiger filtering) to assess the association of DNAm with glioma risk factors and glioma incidence. We used data from a recently published catalogue of germline genetic variants robustly associated with DNAm variation in blood (32,851 participants) and data from a genome-wide association study of glioma risk (12,488 cases and 18,169 controls, sub-divided into 6191 glioblastoma cases and 6305 non-glioblastoma cases). MR evidence indicated that DNAm at 3 CpG sites (cg01561092, cg05926943, cg01584448) in one genomic region (HEATR3) had a putative association with glioma and glioblastoma risk (False discovery rate [FDR] < 0.05). Steiger filtering provided evidence against reverse causation. Colocalization presented evidence against genetic confounding and suggested that differential DNAm at the 3 CpG sites and glioma were driven by the same genetic variant. MR provided little evidence to suggest that DNAm acts as a mediator on the causal pathway between risk factors previously examined and glioma onset. To our knowledge, this is the first study to use MR to appraise the causal link of DNAm with glioma risk factors and glioma onset. Subsequent analyses are required to improve the robustness of our results and rule out horizontal pleiotropy.

Mendelian randomization (MR) studies have provided some evidence to implicate genetically predicted leukocyte telomere length, allergic diseases (asthma, hay fever or eczema), alcohol consumption, childhood extreme obesity, low-density lipoprotein cholesterol (LDLc) and triglyceride levels as causally relevant risk factors for glioma 24 . The underlying biological mechanisms by which these traits causally relate to glioma risk remains to be established.
One approach to understanding the aetiological pathways influencing glioma onset is to exploit the increasing body of molecular phenotype data to examine epigenetic pathways. Epigenetic changes include chemical modifications that do not change the sequence of DNA but can alter gene expression 25 . The most commonly measured form of epigenetic mark is DNA methylation (DNAm), whereby a methyl group (-CH 3 ) is either added or subtracted to a cytosine nucleotide adjacent to a guanine nucleotide within the DNA sequence (cytosinephosphate-guanine [CpG] site) 25 . One method to examine DNAm variation linked to glioma incidence is to undertake an epigenome-wide association study (EWAS) [26][27][28][29][30][31][32][33][34][35][36] . However, most EWAS have been limited by very modest sample sizes or have been undertaken using glioma tumour tissue which are potentially biased through confounding by treatment thus restricting any inferences that can be made with respect to disease aetiology.
As recent studies have reported that DNAm influences glioma-related traits including allergic diseases 37 , telomere length 38 childhood obesity 39 and glioma risk 40 , we sought to assess the causal relationship of DNAm with glioma risk factors identified in a prior study 24 (Table 1) and glioma incidence using two-step MR 41 . We used a recently published catalogue of germline genetic variants robustly associated with DNAm variation in blood, namely methylation quantitative trait loci (mQTL) 42 , as a proxy for DNAm variation in blood, rather than measuring DNAm variation directly. As glioma is a disease with a high degree of heterogeneity, with differing genetic profiles both intra-and inter-tumourally 43 , we performed a subtype analysis by splitting the glioma outcome data into glioblastoma or non-glioblastoma. An overview of the research questions can be found in Fig. 1.

Results
Does DNAm causally influence both glioma risk and glioma risk factors? Using the full summary statistics for the 232,476 CpG sites (n = 32,851) reported in GoDMC, instrumental variables (IVs) were constructed (P < 5 × 10 −8 and r 2 < 0.001) to act as a proxy for 42,659 CpG sites that could be used in a two-sample MR framework.
Two-sample MR was used to investigate the potential causal effect of DNAm variation at 42,659 CpG sites and glioma risk. For glioma risk there was MR evidence for 284 CpG-glioma effects that met the false discovery rate (FDR) correction threshold (< 0.05). MR results that met the FDR threshold can be found in Appendix 1.1. F-statistic calculations indicated that all 284 CpG sites linked to glioma had an F-statistic > 10 (Appendix 1.2) which suggests that the MR estimate was less likely to be affected by weak instrument bias. www.nature.com/scientificreports/ As a sensitivity analysis, colocalization was used to establish the probability that DNAm and glioma were driven by the same causal variant at each locus. In the colocalization analyses, we found suggestive evidence (H 4 > 70%) that DNAm at 3 of the 284 CpG sites and glioma were driven by the same genetic variant. Next, we examined the directionality of DNAm at the 3 CpG sites and glioma risk using the Steiger filtering method: the 3 CpG sites showed evidence that the direction of effect was methylation influencing glioma risk (Fig. 2). Complete results from both MR and sensitivity analysis are summarised in Table 2. In the subtype analysis, there were 209 CpG-glioblastoma (F-statistic > 10) MR estimates that met the FDR correction threshold (FDR < 0.05) (Appendix 1.3). 3 CpG-glioblastoma associations showed evidence of colocalization and all 3 CpG sites showed evidence that the direction of effect was methylation influencing glioblastoma risk (Fig. 2). The full MR results and results from each sensitivity analysis is summarised in Table 3.
Appraising the causal role of DNA methylation on glioma risk factors. We performed two-sample MR to examine the causal role of DNAm variation at the 3 CpG sites altering risk of glioma or glioma subtypes with glioma risk factors. The results from the extensive analysis are present in Table 4. We identified 5 associations that survived the FDR corrected p-value threshold (p-adjusted < 0.05). Two of these associations were robust to colocalization and Steiger filtering. The results indicate that DNAm variation at cg05926943 and cg01561092 are associated with an increase in telomere length (OR 1.12, 95% CI 1.08-1.15, p-adjusted 3.90 × 10 − 11 : OR 1.04, 95% CI 1.03-1.06, p-adjusted 8.96 × 10 − 7 ), respectively (Fig. 3).
Overlap with gene expression. DNAm variation at the 3 CpG sites (cg01561092, cg05926943, cg01584448) found to putatively influence glioma and glioblastoma risk were used to investigate hypothesis driven tissuespecific effects. We hypothesised that DNAm that influences glioma and glioblastoma risk may be influenced by gene expression in blood and brain tissue. All 3 CpG sites were annotated to the gene HEATR3 (Ensemble ID ENSG00000155393).
To evaluate the association of gene expression with glioma and glioblastoma risk at HEATR3 in blood tissue, instruments were constructed using eQTLGen Consortium (n = 31,684).
When comparing the DNAm MR results with the gene expression MR results, the direction of effect estimated for HEATR3 is consistent with cg01584448. The direction of the estimated effect for the two CpG sites (cg01561092, cg05926943) was discordant with gene expression (Fig. 4).

DNA methylaƟon Risk factor Glioma
Risk factor DNA methylaƟon Glioma Two-step MR Step 1 SNP Risk factor DNA methylaƟon G lioma Step 2 mQTL Two-step MR mQTL DNA methylaƟon Glioma DNA methylaƟon R isk factor Glioma Step 2 mQTL Step 1 Risk factor DNA methylaƟon Glioma CpG sites that showed robust evidence of a causal role on glioma risk. Forest plot of CpG sites that showed robust MR evidence of an association with glioma or glioblastoma and colocalized with glioma or glioblastoma. OR, per standard deviation change in genetically proxied DNA methylation; 95% CI, 95% confidence intervals; p-adjusted, p-value adjusted for FDR for the observed effect.  www.nature.com/scientificreports/ To establish if the associations between the CpG sites and glioma is mediated by changes in gene expression at HEATR3 in blood tissue we applied "moloc". Moloc assessed the likelihood that DNAm, gene expression and glioma susceptibly are driven by the same causal variant. The results indicated suggestive evidence (PPA > 70%) of colocalization between gene expression and glioma (but not by DNAm at cg01561092). Similarly, colocalization between DNAm and glioma at cg05926943 was observed but not with gene expression. The results provided evidence of two distinct causal variants for methylation and expression at cg01584448 (Table 6).
Next, to establish if there was an association between gene expression and glioma or glioblastoma risk at HEATR3 in brain tissue, instruments were constructed using data from GTEx v8 (n = 1194).
The two associations from the MR analysis survived the FDR corrected p-value threshold, however, neither showed evidence of colocalization suggesting the MR result may be biased by genetic confounding. The results from the extensive analyses are provided Table 7. The MR estimates of CpG methylation on glioma, glioblastoma and telomere length. The association between that associated with glioma, glioblastoma and telomere length; OR (95%) is the effect of DNAm on glioma, glioblastoma and telomere length. MR effect estimates are reported as odds ratios (95% confidence intervals (CI)) per 1 standard deviation change in genetically proxied DNAm.

Discussion
Extensive analyses were conducted to establish the role of DNAm on the causal pathway leading to glioma onset. MR evidence robust to the FDR p-value threshold and Steiger filtering identified 3 CpG sites (cg01561092, cg05926943, cg01584448) in one genomic region (HEATR3) that have a putative association with glioma and glioblastoma risk. In support of these findings, MR provided evidence that higher levels of gene expression of HEATR3 in blood tissue was associated with an increased risk of glioma and glioblastoma. MR provided little evidence to suggest any CpG sites influenced non-glioblastoma. By examining the role of DNAm variation at these 3 CpG sites with putative glioma related traits (alcohol consumption, allergic disease, childhood obesity, LDL cholesterol, triglycerides, and telomere length), we report evidence that 2 of these CpG sites (cg01561092, cg05926943) influenced telomere length. MR offered little evidence to suggest that DNAm acts as a mediator on the causal pathway between glioma related traits previously examined and glioma onset. Higher levels of methylation at cg01584448 were associated with an increase in glioma and glioblastoma risk. Whereas higher levels of methylation at cg5926943 and cg01561092 were associated with a lower risk of glioma and glioblastoma. To elucidate the observed putative association, the CpG sites were annotated to their closest gene. As the CpG sites reside in close genomic positions they were mapped to the same gene, a known oncogene, HEATR3, which has been associated with glioma risk in previous studies [44][45][46] ; thus, providing evidence that the genomic region is relevant. Here MR, colocalization and Steiger filtering offered further evidence that differential gene expression of HEATR3 within blood tissue increased the risk of glioma and glioblastoma. A conflicting  Table 5. The MR results for the analysis of differential gene expression in blood tissue with glioma and glioblastoma risk. P-adjusted, p-value adjusted for FDR. MR effect estimates are reported as odds ratios (95% confidence intervals (CI)) per 1 standard deviation change in genetically proxied differential gene expression. SNP, single nucleotide polymorphism. Due to the complex nature of this interaction between DNAm and gene expression, moloc was implemented to establish if glioma, DNAm and gene expression shared a common causal genetic variant, to provide further supporting evidence of an underlying causal association between these traits rather than findings being driven through genetic confounding (e.g., LD between an mQTL and a variant influencing glioma risk). The results from the moloc analysis indicated that gene expression colocalizes with glioma but not with DNAm at cg01561092. Similarly, colocalization between DNAm and glioma at cg05926943 was observed but not with gene expression. There was evidence of two distinct causal variants for methylation and expression at cg01584448. There is evidence of colocalization between two of the traits at each CpG site (gene expression and glioma risk; methylation www.nature.com/scientificreports/ and glioma risk) thus it is possible that gene expression is under the control of methylation of a region rather than specific CpG sites. The incidence and mortality of high-grade glioma increases with age, with the median age at diagnosis of 64 years 49 . The 3 CpG sites putatively associated with glioma risk in this study have been linked to age in previous EWAS 50 . Age-specific differences in glioma susceptibility could reveal clues about glioma aetiology. Additionally, previous models of age, based on DNAm have demonstrated an ability to predict the risk of both disease and survival in pre-cancerous tissue, including brain tissue [51][52][53] . These findings provide a rationale to evaluate whether an association exists between these epigenetic markers and age at diagnosis in glioma and subsequently whether DNAm can act as a prognostic marker.
Prior epidemiological studies have reported that longer leukocyte telomere length is linked to an increased risk of glioma 24,54 . Here, we provide evidence to further elucidate the molecular mechanism between telomere length, DNAm and glioma risk. Contrary to previous studies, we observed evidence that DNAm influencing the CpG sites (cg01561092, cg05926943) decreased glioma risk and increased leukocyte telomere length. The conflicting correlation could be a result of the complexity of the association underlying glioma development. A noteworthy concern is that since methylation was studied in blood tissue, which is unlikely to accurately proxy DNAm in the brain, the associations may be biased by confounding by tissue heterogeneity.
There was little evidence to suggest the glioma related traits influence cancer development through DNAm. These null results could reflect the fact that DNAm is not a causal mediator between these traits and glioma onset, or it could be a consequence of this MR study being underpowered since the variance explained by the IV for the trait was limited. In an attempt to reduce weak instrument bias, we obtained the summary data to proxy the Table 6. The results from the moloc analysis. Significant values are in bold. The columns provide the posterior probability (PPA) for each colocalization scenario where E = eQTL, DM = mQTL, G = trait. The trait is provided in the first column. A PPA > 0.7 was used as suggestive evidence for that scenario and PPA > 0.8 was used as strong evidence.  Table 7. The Mendelian randomization results for the analysis of differential gene expression in brain with glioma and glioblastoma risk. P-adjusted, p-value adjusted for FDR. MR effect estimates are reported as odds ratios (OR) (95% confidence intervals (CI)) per 1 standard deviation change in differential gene expression.  www.nature.com/scientificreports/ glioma related traits from GWAS with a large sample size to improve the reliability of the causal estimates and we only used SNPs with an F statistic greater than 10. An important consideration in the interpretation of this analysis is explained in detail by Min JL et al. 42 . The blood measured mQTL data utilised in this chapter, obtained from the GoDMC data set 42 , cannot be regarded as mediating the genetic association to the trait even when there is colocalization evidence of a shared genetic variants. Rather, when DNAm shows evidence of colocalizing with a complex trait, such as glioma and telomere length, then this is likely due to common cause. Therefore, despite CpG sites showing evidence of colocalization, it is possible that second instrumental variable assumption has been violated, as there could be a common cause for both DNAm and glioma risk. To establish if the CpG sites identified here are truly implicated in glioma onset more detailed analyses are required to triangulate evidence and to fully understand the mechanistic pathways.
Another limitation of this study is the fact that we used single-instrument MR to examine causal relationships and consequently was not properly able to appraise possible horizontal pleiotropic effects. We took measures to minimise this possibility: instruments were limited to cis-mQTLs as trans-mQTLs are more likely to have effects on methylation and glioma risk via distinct mechanisms; and colocalization techniques were implemented to test whether the putative causal variant is shared by the exposure (e.g., risk factor or DNAm) and the outcome (e.g., glioma or DNAm) 55-57 thus increasing the probability that the two traits have a shared causal mechanism 55,58 .
Despite these limitations, this analysis has numerous strengths, including the use of two-sample MR to examine the causal role of DNAm in glioma risk by exploiting a vast epigenetic resource and the largest glioma GWAS. Thus, leading to increased statistical power and precision of effect estimates. Furthermore, to ensure IVs were valid, genetic instruments were constructed using a strict inclusion criteria and quality control steps were undertaken. For example, only cis-variants were included and instrument strength was checked. In addition, the orientation of the causal effect was inferred to reduce the likelihood of reverse causation.

Methods
Reported results from all analyses are MR effect estimates that met either the false discovery rate (FDR) threshold (when DNAm or gene expression is the exposure) or the Bonferroni-corrected p-value threshold (glioma related traits is the exposure), showed evidence of colocalization 59 to rule out genetic confounding, and displayed little evidence to suggest reverse causation through Steiger filtering (Fig. 2) 60 . All MR analyses were conducted using the "TwoSampleMR" package in R studio (version 4.1.0) using the computational facilities of the Advanced Computing Research Centre, University of Bristol (http:// www. brist ol. ac. uk/ acrc/).
When DNAm or gene expression were instrumented as the exposure, we opted to use a more liberal FDR corrected p-value threshold, as we did not expect complete independence of all statistical tests (within overall glioma, glioblastoma, or non-glioblastoma analyses), compared to the Bonferroni p-value threshold used, when a glioma related trait was instrumented as the exposure.
Mendelian randomization estimate. In cases where there was a single nucleotide polymorphism (SNP) to act as a proxy for the exposure of interest (e.g., DNAm), the causal effect estimates from MR were calculated using the Wald ratio (β GD /β GP ) 61 and standard errors approximated using the delta method 62 . Where the exposure (e.g., DNAm variation at a CpG site) was instrumented by multiple independent SNPs (r 2 < 0.001), causal effect estimates were calculated using the random effects inverse variance weighted (IVW) method to allow overdispersion, where the Wald ratios were combined into a single causal estimate by meta-analysis 63 .
Colocalization. IV2 violations can occur through genetic confounding if genetic variants are correlated through linkage LD (Fig. 3). Therefore, for associations which met the p-value threshold (FDR < 0.05) we applied pairwise conditional and colocalization (PWCoCo) 57 to determine whether the genetic variant associated with the exposure, e.g., DNAm, was the same genetic variant altering the outcome e.g., glioma (i.e., as identified in glioma genome wide association study [GWAS]), thus permitting evaluation of the presence of genetic confounding 64 . Colocalization requires providing prior probabilities that any random SNP within the genomic region of interest is associated with the exposure, the outcome or both (p1 = 1e−4, p2 = 1e−4, p12 = 1e−5). SNPs from a ± 250KBP window were extracted around the instrumented SNP(s) for each putative causal SNP from the exposure and outcome GWAS. A posterior probability for H 4 > 0.8 was designated as "strong" and 0.7 > a posterior probability for H 4 < 0.8 as "suggestive" evidence.
Directionality test. To increase the likelihood that MR infers the correct causal direction between the exposure (e.g., DNAm) and the outcome (e.g., glioma), we applied the Steiger filtering method to test for reverse causation 60 . Steiger filtering removes SNPs that explain more of the variance in the outcome than the exposure and therefore the MR estimate is less likely to biased by misspecification in the MR model. Steiger filtering was performed for the putative causal variants identified in the MR analysis that showed evidence of colocalization.

Hypothesis 1.
A summary of the research questions addressed in hypothesis 1 is displayed in Fig. 6.
Step 1: evaluating the relationship between DNA methylation and glioma risk. Instrument selection. Two-sample MR was implemented to ascertain the potential causal effects of circulating DNAm on glioma risk. To create genetic IVs for DNAm as the exposure we used effect estimates for germline cis-SNPs (SNPs within a ± 250KBP window of the CpG site) robustly associated with DNAm at CpG site (mQTL) at genome wide significance (P < 5 × 10 -8 ) 42 that had undergone LD clumping (r 2 < 0.001) from the mQTL database Genetics of DNA Meth- Outcome selection. For the glioma outcome, summary data were obtained from a GWAS meta-analysis of 12,488 glioma cases and 18,160 controls 66 . MR analyses were performed to assess the causal impact of DNAm variation on glioma subtypes: glioblastoma (6,183 cases) and non-glioblastoma (5,820 cases).
Mendelian randomization effect estimate and p-value threshold. MR effect estimates are reported as odds ratios (OR) (95% confidence intervals (CI)) per 1 standard deviation (SD) increase in genetically proxied DNAm. www.nature.com/scientificreports/ Step 2: evaluating the relationship between DNA methylation and glioma related traits. Instrument selection. As described above, IVs for DNAm were generated (r 2 < 0.001, P < 5 × 10 -8 ) for CpG sites associated with either glioma, glioblastoma, and/or non-glioblastoma in step 1 above.
Outcome selection. For the outcome, summary data for the putative glioma related traits 24 (genetically predicted leukocyte telomere length, allergic disease, alcohol consumption, childhood extreme obesity, LDLc and triglyceride levels) was obtained from MR-Base (a curated data base that contains complete GWAS results) 67 (Table 9).
Follow up tissue-specific Mendelian randomization analysis.. For the CpG sites that showed robust evidence of an effect with glioma risk, we investigated whether variation in tissue-specific gene expression was responsible for the effect with glioma risk. For the analysis we utilised blood tissue by incorporating gene expression data from the eQTLGen Consortium (n = 31,684) (https:// www. eqtlg en. org/) 68 and brain tissue utilising gene expression data from 13 brain tissues from The Genotype-Tissue Expression project (GTEx) v8 (n = 1194) 69 . CpG sites were annotated to genes using the R package meffil 70 . IVs for genes were constructed using effect estimates for germline cis-SNPs (within a ± 250KBP window) associated with gene expression variation in brain and blood, namely expression quantitative trait loci (eQTLs) at genome wide significance (P < 5 × 10 -8 ) 42 that had undergone LD clumping (r 2 < 0.001). To measure instrument strength, we examined the variance in gene expression explained by the eQTLs (R 2 ) and the F statistic 65 .
Multiple trait colocalization. For genes that appeared to overlap with the CpG sites of interest we applied multiple trait colocalization (moloc) 71 to investigate whether the same genetic variant influences proximal DNAm, proximal gene expression and glioma risk. Such analyses can provide evidence to support gene expression and DNAm residing on the same causal pathway to glioma onset 72 . We implemented "moloc" using data from three different data sources: DNAm data from the mQTL database GoDMC [http:// www. godmc. org. uk/] (n = 32,851) 42 , gene expression data from the eQTLGen Consortium (n = 31,684) (https:// www. eqtlg en. org/) 68 and GWAS meta-analysis data for glioma 66 . Moloc default prior probabilities were implemented (p1 = 1 × 10 -4 , p2 = 1 × 10 -6 and p3 = 1 × 10 -7 ), p1 was used for one association, p2 for two associations, and p3 for colocalization of all three associations. We examined colocalization with expression of all genes with a ± 250KBP window of the CpG site of interest. At least 50 variants (minor allele frequency [MAF] > 0.05) common to all three datasets were required for the analysis. A posterior probability of greater than 70% was considered suggestive evidence of colocalization. All analyses were undertaken in R version 4.1.0.

Hypothesis 2.
A summary of the research questions addressed in hypothesis 2 is displayed in Fig. 7.
Step 1: evaluating the relationship between glioma related trait and DNA methylation. Genetic instruments for the glioma related traits were collated from MR-Base 67 or directly from the relevant GWAS (details of studies used to obtain genetic instruments are given in Table 10).  www.nature.com/scientificreports/ Genetic instruments were created using SNPs with an F statistic equal to or greater than 10, shown to be robustly (P < 5 × 10 − 8 ) and independently (r 2 < 0.001) associated with the glioma related trait under examination in individuals of European ancestry.
Mendelian randomization estimate and p-value threshold. The MR estimate was expressed as SD increase in methylation per unit increase in the glioma related trait. A Bonferroni-corrected p-value threshold, P value < 0.0083 (0.05/6 as there were 6 traits included in the analysis), was used to evaluate the strength of the statistical evidence.
Step 2: evaluating the relationship between DNA methylation associated with glioma related traits and glioma risk. Using IVs for the CpG sites that were influenced by putative glioma related traits, we examined if DNAm variation at these CpG sites had an MR effect on glioma risk using the glioma GWAS (12,488 glioma cases and 18,160 controls) 66 . MR effect estimates are reported as the OR (95% CI) per 1 SD increase in genetically proxied DNAm.
Ethics approval and consent to participate. Ethical approval was not required for this specific analysis as the entirety of the data was sourced from the summary statistics of a published GWAS and no individual-level data were used.

Data availability
Genetic instrument for DNAm can be obtained from the mQTL database GoDMC [http:// www. godmc. org. uk/] (n = 32,851). Genetic instruments used to proxy the six risk factors can be found through MR-Base (http:// www. mrbase. org/) or from the individual reference papers. Meta-analysed glioma GWAS data were acquired from the study by Melin et al. 66 ., which is a meta-analysis of eight independent GWAS studies (UK 73 , French 74 , German 75 , MDA 44 , UCSF-SFAGS 44 , GliomaScan 76 , GICC 64 and UCSF/Mayo 77 ). Genotype data from the Glioma International Case-Control Consortium Study GWAS are available from the database of Genotypes and Phenotypes (dbGaP) under accession phs001319.v1.p1. Genotypes from the GliomaScan GWAS can be accessed through dbGaP accession phs000652.v1.p1.